Downsample waterlevel trends to daily min depth-to-water#87
Merged
Conversation
High-frequency wells (e.g. continuous loggers, tens of thousands of readings) made the per-well Mann-Kendall test (O(n^2)) blow up memory and get the combine child process OOM-killed (ChildProcessCrashException, no Python traceback). Before the trend test, reduce each well's observations to one point per calendar day keeping the daily minimum depth-to-water (the shallowest reading), via _daily_min_series. This bounds the MK cost and removes within-day sampling noise. - record_count is now the daily-point count used for the trend; new observation_count carries the raw reading count. - qualification gate and classification unchanged (applied to daily points); updated TREND_METHOD_DESCRIPTION. - test: daily-min downsampling (same-day readings collapse to the min). 14 persister tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Your pull request is automatically being deployed to Dagster Cloud.
|
The trend combine rebuilt every observation into a ParameterRecord (and each site into a SiteRecord) before computing — millions of objects for statewide water-level data, on top of the already-large pickled inputs. Consume the payload dicts directly: dump_waterlevel_trend_collection and _daily_min_series now read dicts (obs.get / site.get) instead of getattr on record objects, and the combine asset passes all_sites/all_timeseries straight through. Cuts peak memory in the step that was OOM-crashing. Other dumpers (summary, major-chemistry, timeseries) still use record objects. Trend test helpers now build dicts. 14 persister tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The
nm_waterlevel_trendscombine step crashed its child process at the native level —ChildProcessCrashExceptionwith no Python traceback (OOM kill). Root cause:pymannkendall.original_testis O(n²), and high-frequency wells (continuous loggers — tens of thousands of readings) blew up memory/CPU running the per-well test.Fix
Before the trend test, downsample each well's observations to one point per calendar day, keeping the daily minimum depth-to-water (the shallowest reading) —
_daily_min_series. This bounds the Mann-Kendall cost and removes within-day sampling noise. A well measured continuously for years collapses from ~10⁴–10⁵ points to ~10³ daily points.Changes
_daily_min_series(obs_list)— groups by UTC calendar day, keeps min DTW per day at the day's midnight epoch; returns(raw_count, sorted_daily_pairs).record_countis now the daily point count used for the trend; newobservation_countcarries the raw reading count.TREND_METHOD_DESCRIPTIONto document the daily-min step.observation_countvsrecord_count.Verification
14 persister tests pass. Independent of #86 (touches only
ogc_features.py+ its test).🤖 Generated with Claude Code